Accelerated gradient descent

Initialize starting vector $\mathbf{v}^{(0)} = \mathbf{y}^{(1)} = \mathbf{z}^{(1)}$. For $t = 1, \dots, T$, compute the accelerated update steps.
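
The updates themselves are not written out above. As a sketch of what they might look like: one standard form of Nesterov's method for a $\beta$-smooth, $\alpha$-strongly convex $f$ takes a gradient step followed by a constant-momentum extrapolation (this two-sequence variant and its coefficient are assumptions here, not taken from the source; the three initialized sequences $\mathbf{v}, \mathbf{y}, \mathbf{z}$ suggest the source may instead use a linear-coupling presentation):

$$\mathbf{y}^{(t+1)} = \mathbf{z}^{(t)} - \frac{1}{\beta}\nabla f(\mathbf{z}^{(t)}), \qquad \mathbf{z}^{(t+1)} = \mathbf{y}^{(t+1)} + \frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\left(\mathbf{y}^{(t+1)} - \mathbf{y}^{(t)}\right).$$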

Let $f$ be $\alpha$-strongly convex and $\beta$-smooth. Then, after running accelerated gradient descent for $T$ steps, the final iterate $\mathbf{x}^{(T)}$ satisfies, with $\kappa = \frac{\beta}{\alpha}$,

$$f(\mathbf{x}^{(T)}) - f(\mathbf{x}^*) \leq \kappa\, e^{-T/\sqrt{\kappa}}\left[f(\mathbf{x}^{(0)}) - f(\mathbf{x}^*)\right]$$
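
To make this concrete, here is a minimal runnable sketch (NumPy) under the same assumed update rule as above; the names (`agd`, `grad_f`, the toy quadratic) are illustrative, not from the source:

```python
import numpy as np

def agd(grad_f, x0, alpha, beta, T):
    """T steps of (a standard variant of) accelerated gradient descent.

    Assumes f is beta-smooth and alpha-strongly convex; uses the
    constant momentum coefficient (sqrt(kappa)-1)/(sqrt(kappa)+1),
    which is an assumption about the variant, not the source's pseudocode.
    """
    kappa = beta / alpha
    momentum = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)
    y_prev = x0.copy()  # y^{(1)}
    z = x0.copy()       # z^{(1)}
    for _ in range(T):
        y = z - grad_f(z) / beta         # gradient step at extrapolated point
        z = y + momentum * (y - y_prev)  # momentum extrapolation
        y_prev = y
    return y

# Toy check: f(x) = 0.5 * x^T A x, with eigenvalues of A in [alpha, beta],
# so f is alpha-strongly convex, beta-smooth, and minimized at x* = 0.
rng = np.random.default_rng(0)
alpha, beta, d = 1.0, 100.0, 50          # kappa = 100
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
A = Q @ np.diag(np.linspace(alpha, beta, d)) @ Q.T

f = lambda x: 0.5 * x @ A @ x
x0 = rng.standard_normal(d)
x_hat = agd(lambda x: A @ x, x0, alpha, beta, T=100)
print(f(x_hat) / f(x0))  # roughly bounded by kappa * exp(-T / sqrt(kappa))
```

With $\kappa = 100$ and $T = 100$, the bound predicts a relative error of at most $\kappa\, e^{-T/\sqrt{\kappa}} = 100\, e^{-10} \approx 4.5 \times 10^{-3}$.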


Complexity analysis:

Suppose $f$ is strongly convex, with $mI \preceq \nabla^2 f(x) \preceq MI$ for all $x$. Recall that plain gradient descent with step size $t = 1/M$ satisfies
$$f(x^{(k)}) - p^* \leq \left(1 - \frac{1}{\kappa}\right)^k \left(f(x^{(0)}) - p^*\right),$$
where $\kappa = M/m$ and $p^*$ is the minimum value of $f$.
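
To compare iteration counts (a short derivation, using $1 - x \leq e^{-x}$; the target is relative accuracy $\epsilon$, i.e. $f - p^* \leq \epsilon\,(f(x^{(0)}) - p^*)$):

$$\left(1 - \frac{1}{\kappa}\right)^k \leq e^{-k/\kappa} \leq \epsilon \quad \text{once} \quad k \geq \kappa \log\frac{1}{\epsilon},$$

whereas the accelerated bound above gives

$$\kappa\, e^{-T/\sqrt{\kappa}} \leq \epsilon \quad \text{once} \quad T \geq \sqrt{\kappa}\, \log\frac{\kappa}{\epsilon}.$$

So acceleration improves the dependence on the condition number from $\kappa$ to roughly $\sqrt{\kappa}$.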

#incomplete

